Genome-wide analysis of R2R3-MYB transcription factors in Japanese morning glory

The R2R3-MYB transcription factors form one of the largest transcription factor families in plants. R2R3-MYBs perform a variety of functions in plants, such as cell fate determination, organ and tissue differentiation, primary and secondary metabolism, stress and defense responses and other physiological processes. The Japanese morning glory (Ipomoea nil) has been widely used as a model plant for flowering and morphological studies. In the present study, 127 R2R3-MYB genes were identified in the Japanese morning glory genome. Information, including gene structure, protein motif, chromosomal location and gene expression, was assigned to the InR2R3-MYBs. Phylogenetic tree analysis revealed that the 127 InR2R3-MYBs were classified into 29 subfamilies (C1-C29). Herein, physiological functions of the InR2R3-MYBs are discussed based on the functions of their Arabidopsis orthologues. InR2R3-MYBs in C9, C15, C16 and C28 may regulate cell division, flavonol biosynthesis, anthocyanin biosynthesis and response to abiotic stress, respectively. C16 harbors the known anthocyanin biosynthesis regulator, InMYB1 (INIL00g10723), and putative anthocyanin biosynthesis regulators, InMYB2 (INIL05g09650) and InMYB3 (INIL05g09651). In addition, INIL05g09649, INIL11g40874 and INIL11g40875 in C16 were suggested as novel anthocyanin biosynthesis regulators. We organized the R2R3-MYB transcription factors in the morning glory genome, assigned information on their gene and protein structures, and predicted their functions. Our study is expected to facilitate future research on R2R3-MYB transcription factors in Japanese morning glory.


Introduction
Transcription factors (TFs) are essential for the regulation of gene expression. Specific binding of TFs to cis-elements in promoter regions of genes activates or represses gene expression, thereby controlling various physiological events, such as tissue and organ development, metabolic processes, and stress responses [1][2][3][4]. A large number of TF genes are present in other floricultural plants, and TFs have been used as targets in molecular breeding to change flower color and morphology (http://www.cres-t.org/fiore/public_db/index.shtml) [46]. As described above, R2R3-MYBs play various important roles in plants; however, information on R2R3-MYBs in morning glory is currently limited. Clarification of R2R3-MYB functions in this plant is important for understanding flowering, coloration, morphology and other important traits of flowers. The Japanese morning glory 'Tokyo Kokei Standard line' genome (750 Mb) has been sequenced up to 98%, and scaffolds covering 91.42% of the assembly have been anchored to 15 pseudo-chromosomes [35]. In this study, we identified 127 genes encoding R2R3-MYB TFs in the Japanese morning glory genome and assigned and listed their information, such as gene ID, gene structure, protein motif, chromosomal location, gene expression profile, and physiological functions.

Multiple sequence alignment and phylogenetic analysis
The amino acid sequences of R2R3-MYBs were aligned using the ClustalW program [51], and an unrooted neighbor-joining phylogenetic tree was constructed using MEGA X [52] with the following parameters: Poisson model, pairwise deletion and 1,000 bootstrap replications. The Japanese morning glory R2R3-MYBs were classified into clades based on a bootstrap value of 50 or higher. However, even if the bootstrap value was below 50, R2R3-MYBs associated with a particular subgroup of Arabidopsis R2R3-MYBs were treated as a single clade. R2R3-MYBs that met neither condition were not considered to belong to any clade.
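The clade-assignment rule described above can be written as a small decision function. This is a minimal sketch, not the authors' procedure as implemented: the function name, its arguments and the idea of passing the associated Arabidopsis subgroup as an optional label are assumptions made for illustration. Only the cutoff of 50 comes from the text.

```python
from typing import Optional

# Bootstrap cutoff stated in the text.
BOOTSTRAP_CUTOFF = 50


def accept_clade(bootstrap: float, arabidopsis_subgroup: Optional[str]) -> bool:
    """Decide whether a candidate group of R2R3-MYBs is treated as a clade.

    bootstrap: bootstrap support (0-100) of the node joining the group.
    arabidopsis_subgroup: hypothetical label (e.g. "S6") of the Arabidopsis
        subgroup the group is associated with, or None if there is none.
    """
    # Rule 1: well-supported nodes are accepted as clades.
    if bootstrap >= BOOTSTRAP_CUTOFF:
        return True
    # Rule 2: weakly supported groups are still accepted when they are
    # associated with a particular Arabidopsis subgroup.
    return arabidopsis_subgroup is not None
```

Under this sketch, a group with bootstrap support of 35 is kept only if it sits with a known Arabidopsis subgroup; otherwise its members remain unassigned.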

Gene structure and protein motif analyses
Gene structure (exon-intron structure) was visualized from the coding sequences and the genomic sequences of the Japanese morning glory R2R3-MYBs using the Gene Structure Display Server (GSDS: http://gsds.gao-lab.org/) [53]. Multiple Expectation Maximization for Motif Elicitation (MEME: https://meme-suite.org/meme/tools/meme) [54] was used to identify the conserved protein motifs of the R2R3-MYBs, with the following parameters: the maximum number of motifs was set to 20 and the motif width was set to between six and 100 amino acids.

Chromosomal location and gene duplication analysis
Information on the chromosome distribution of the InR2R3-MYB genes was obtained from the Japanese morning glory genome database, while MapChart [55] was used for the graphical presentation of chromosomal location. Tandemly duplicated genes were defined as an array of two or more InR2R3-MYB genes falling within 100 kb of one another.
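The tandem-duplication criterion above can be sketched as a simple grouping of neighboring genes on each chromosome. This is an illustrative sketch under stated assumptions: the text does not say how the 100 kb distance was measured, so the code assumes the gap between the end of one gene and the start of the next, and it chains neighbors into one array. The gene names and coordinates in the usage example are invented.

```python
from collections import defaultdict

# Distance cutoff stated in the text (100 kb).
WINDOW = 100_000


def tandem_clusters(genes):
    """Group genes into tandem arrays.

    genes: iterable of (gene_id, chromosome, start, end) tuples.
    Returns a list of clusters, each a list of two or more gene IDs whose
    neighboring members lie within WINDOW bp of each other (assumption:
    distance is measured from the previous gene end to the next gene start).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current = []      # gene IDs in the array being built
        prev_end = None   # right-most end coordinate seen in that array
        for start, end, gene_id in sorted(by_chrom[chrom]):
            if current and start - prev_end <= WINDOW:
                current.append(gene_id)
                prev_end = max(prev_end, end)
            else:
                # Close the previous array; keep it only if it has >= 2 genes.
                if len(current) >= 2:
                    clusters.append(current)
                current = [gene_id]
                prev_end = end
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Hypothetical coordinates, for illustration only.
example = [
    ("geneA", "Chr1", 10_000, 12_000),
    ("geneB", "Chr1", 50_000, 53_000),
    ("geneC", "Chr1", 400_000, 402_000),
    ("geneD", "Chr2", 5_000, 7_000),
]
print(tandem_clusters(example))
```

With these invented coordinates, only geneA and geneB form a tandem array; geneC is too far away and geneD is alone on its chromosome.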

In silico gene expression analysis
Gene expression data for various organs of morning glory were downloaded from the Japanese morning glory genome database (http://viewer.shigen.info/asagao/jbrowse.php?data=data/Asagao_1.2/) and converted to base-10 logarithms. A heatmap was then created using the R package gplots (https://cran.r-project.org/web/packages/gplots/index.html). Details of the samples are described in [35]. Briefly, the embryo sample consists of immature green embryos; the flower sample includes fully opened flowers and flower buds at various stages; the leaf sample includes leaves of various sizes; the stem sample includes young stems with shoot tips; the seed sample includes seed coats at various developmental stages; and the root sample consists of three-week-old roots.
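The log transformation step can be illustrated with a few lines of code. The heatmap itself was drawn with the R package gplots; the Python sketch below only shows the base-10 conversion. The text does not say how zero expression values were handled, so the pseudocount used here is an assumption, and the example values are invented.

```python
import math


def log10_transform(values, pseudocount=1.0):
    """Convert expression values to base-10 logarithms.

    pseudocount is an assumption of this sketch (not stated in the text);
    it keeps zero expression values defined, since log10(0) is undefined.
    """
    return [math.log10(v + pseudocount) for v in values]


# Invented expression values for one gene across several organs.
print(log10_transform([0.0, 9.0, 99.0]))
```

A different treatment of zeros (for example, masking them before plotting) would change the low end of the color scale but not the ordering of expressed genes.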

Identification and classification of the morning glory R2R3-MYBs
To identify the morning glory R2R3-MYBs, we performed a BLAST search of the morning glory genome database (http://viewer.shigen.info/asagao/) using the MYB domain (PF00249) from Pfam and 126 Arabidopsis R2R3-MYBs [9] as queries. A total of 270 candidates were identified from the BLAST search. Subsequently, the presence of the MYB domains was confirmed using Pfam, SMART and PROSITE. As a result, 126 R2R3-MYBs, each harboring two MYB domains, were identified (Table 1 and S1 Table). Among them, three InR2R3-MYBs, InMYB1, InMYB2 and InMYB3, had been reported previously [42]. Although INIL05g09651, which has the highest homology to InMYB3, harbors only one MYB domain, it was considered to be identical to InMYB3 and was therefore included among the R2R3-MYBs. In total, 127 R2R3-MYBs were identified in the morning glory genome. Forty 1R-MYBs (MYB-related proteins), three R1R2R3-MYBs (3R-MYBs) and one 4R-MYB, harboring one, three or four MYB domains, respectively, are also listed in S2 Table. Arabidopsis R2R3-MYBs were classified into 23 functional subgroups (S1-S25) [9], some of which have been well characterized. To understand the evolutionary relationship between the R2R3-MYBs of morning glory and Arabidopsis, and to predict the functions of the morning glory R2R3-MYBs from those of their Arabidopsis orthologues, a phylogenetic tree of the R2R3-MYBs was constructed (Fig 1). The phylogenetic tree revealed 29 subfamilies (C1-C29) of the morning glory R2R3-MYBs (InR2R3-MYBs). Eleven InR2R3-MYBs did not belong to any clade, while subgroup S12 of Arabidopsis R2R3-MYBs was absent in morning glory. Arabidopsis R2R3-MYBs belonging to S12 have been reported to regulate glucosinolate biosynthesis [56][57][58]. Glucosinolates are secondary metabolites unique to Brassicaceae [57], which likely explains why morning glory has no homologs of the Arabidopsis R2R3-MYBs in S12.
On the other hand, C7 and C19 were found to be unique subfamilies in morning glory, suggesting that they might be responsible for unique functions in morning glory.
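The sorting of BLAST candidates by the number of confirmed MYB domains, as described in this section, can be sketched as follows. This is an illustration only: the function name, the candidate IDs and the domain counts in the example are hypothetical, and the manual inclusion of INIL05g09651 (one MYB domain, treated as InMYB3) is a curation step that a count-based rule like this one does not capture.

```python
from collections import Counter

# MYB classes named in the text, keyed by the number of MYB domains.
MYB_CLASSES = {
    1: "1R-MYB",
    2: "R2R3-MYB",
    3: "R1R2R3-MYB",
    4: "4R-MYB",
}


def classify_myb(n_domains: int) -> str:
    """Return the MYB class for a protein with n_domains MYB domains."""
    return MYB_CLASSES.get(n_domains, "unclassified")


# Hypothetical candidates mapped to their confirmed MYB domain counts.
candidates = {"cand1": 2, "cand2": 1, "cand3": 2, "cand4": 3, "cand5": 0}
summary = Counter(classify_myb(n) for n in candidates.values())
print(summary)
```

Candidates with no confirmed MYB domain fall into the "unclassified" bin, which corresponds to BLAST hits that were discarded after domain confirmation.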

Consensus amino acid sequence in the MYB domains of InR2R3-MYBs
The MYB domains of R2R3-MYBs contain highly conserved sequences [2]. To determine the consensus amino acid sequence in the MYB domains of InR2R3-MYBs, we generated sequence logos of the R2 and R3 repeats in the InR2R3-MYBs (Fig 3). Three regularly spaced tryptophan (W) residues in typical MYB domains are important for interaction with specific DNA sequences [8]. All InR2R3-MYBs, except INIL09g35855, had three W residues in the R2 repeat (Fig 3). In the R3 repeat, the first W residue is occasionally replaced by a hydrophobic amino acid, such as phenylalanine (F), isoleucine (I) or leucine (L), as is known for R2R3-MYBs in other plant species [11,17]. The second W residue in the R3 repeat was conserved in all InR2R3-MYBs, whereas the third W residue was conserved in most InR2R3-MYBs but not in INIL02g17103, INIL04g09009, INIL08g38640 and INIL15g23810 (replaced by F), INIL11g18427 (replaced by tyrosine (Y)), and INIL11g40874 (replaced by cysteine (C)).
Conserved amino acid residues in the MYB domains were mainly distributed between the second and third conserved W residues in both R2 and R3 repeats (Fig 3). The region between the second and third W residues corresponds to helix 2, helix 3 and their connecting loop (helix-turn-helix), and this region is important for binding to DNA [7,8]. In particular, the third helix of each repeat (the recognition helix) is essential for the direct interaction with DNA [59]. Consistent with this role, the third helix of the MYB domain in each repeat is highly conserved.

Chromosomal location of InR2R3-MYB genes
The chromosomal locations of the 127 InR2R3-MYB genes are shown in Fig 4. [...] densities of R2R3-MYBs were observed in both arms of Chr. 5 and Chr. 8. Most central regions of these chromosomes lacked InR2R3-MYBs. These trends are consistent with those in tomato [11] and potato [17]. Tandem duplications of InR2R3-MYBs in the morning glory genome were estimated following the method of Huang et al. (2012) [60]; that is, two or more homologous genes within a 100 kb chromosome region were defined as tandemly duplicated genes. As shown in Fig 4, [...] INIL11g40874, INIL11g40875). These InR2R3-MYBs are thought to be the result of gene duplication.

Functions of InR2R3-MYBs
In general, paralogs and orthologues have similar functions, and subfamily members are likely to share a common evolutionary origin and similar functions. Therefore, the functions of the morning glory InR2R3-MYBs belonging to the 29 subfamilies (C1-C29) were estimated based on the known functions of Arabidopsis AtR2R3-MYBs in the 23 subfamilies (S1-S25) (Fig 1).

C5 of morning glory corresponds to S9 of Arabidopsis, which includes AtMYB16 and AtMYB106 (NOK). AtMYB16 and AtMYB106 are MIXTA-like proteins that regulate petal and leaf epidermal cell elongation, trichome formation, and cuticle formation [28][61][62][63]. Therefore, six InR2R3-MYBs in C5 may be involved in the regulation of petal and leaf epidermal cell elongation, trichome formation, and cuticle formation in morning glory. Further, C9 of morning glory corresponds to S14 of Arabidopsis, which includes AtMYB37 (RAX2) and AtMYB84 (RAX3). These genes regulate lateral organ formation [64,65]. Therefore, members of C9 may be involved in the regulation of cell division, such as that underlying lateral organ formation, in Japanese morning glory.
Additionally, C16 of morning glory corresponds to S6 of Arabidopsis, which includes AtMYB75 (PAP1) and AtMYB90 (PAP2). PAP1 and PAP2 regulate anthocyanin biosynthesis [18]. Six InR2R3-MYBs present in C16 may regulate anthocyanin biosynthesis in the morning glory. The function of C16 is discussed in detail in a later section.
C28 of the morning glory corresponds to S22 of Arabidopsis, which includes AtMYB44, AtMYB70 and AtMYB73. The expression of these genes is induced by abiotic stresses, such as drought and wounding [66]. Therefore, the InR2R3-MYBs in C28 may be involved in the regulation of abiotic stress responses.

Gene expression and physiological functions of InR2R3-MYBs
To understand the organ-specific gene expression patterns of InR2R3-MYBs, the RNA-seq data of InR2R3-MYBs in six tissues (embryo, flower, leaf, root, seed coat, and stem) were obtained from the Asagao Genome Database (http://viewer.shigen.info/asagao/). The data were projected on a heat map, as shown in Fig 5. Relatively high gene expression (RPKM>5) in all analyzed organs was observed for INIL04g32702, INIL11g18427 and INIL15g27998 (S5 Table). INIL04g32702 was homologous to AtMYB91 (AS1), which regulates leaf morphogenesis in Arabidopsis [67].
A number of InR2R3-MYBs showed relatively high, organ-specific gene expression levels (RPKM>5) [...] INIL05g09650) were highly expressed specifically in root or stem, respectively (S5 Table).
INIL02g11599 and INIL11g09839, which showed high and specific expression in flower, were in C22 and have high homology to Arabidopsis AtMYB35 (TDF1) and AtMYB21/24, respectively. AtMYB35 functions in the development and differentiation of tapetum tissue in the anther [68]; therefore, INIL02g11599 may be involved in tapetum development. AtMYB21/24 regulate stamen filament development [69]; therefore, INIL11g09839 is expected to be involved in stamen filament development. INIL12g01471, which was in the same clade as AtMYB17, was highly expressed in flower. AtMYB17 has been reported to regulate early inflorescence development [70]; therefore, INIL12g01471 may regulate early inflorescence development. INIL02g11914 and INIL05g09388, which have high homology to AtMYB20, showed high gene expression specifically in root. AtMYB20 negatively regulates the drought stress response [71] and the salt stress response [72], suggesting that INIL02g11914 and INIL05g09388 may regulate abiotic stress responses.
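The RPKM>5 criterion used in this section can be expressed as a small classifier over per-organ expression values. This is a sketch under stated assumptions: the text gives the threshold but not a formal definition of "organ-specific", so the rule below (above threshold in exactly one organ) is an assumption, and the gene profiles in the example are invented rather than taken from S5 Table.

```python
# Expression threshold stated in the text.
RPKM_THRESHOLD = 5.0


def expression_class(rpkm_by_organ):
    """Classify one gene from its {organ: RPKM} profile.

    Returns (label, organs_above_threshold). The "organ-specific" rule
    (exactly one organ above the threshold) is an assumption of this sketch.
    """
    high = [organ for organ, value in rpkm_by_organ.items()
            if value > RPKM_THRESHOLD]
    if rpkm_by_organ and len(high) == len(rpkm_by_organ):
        return "all organs", high
    if len(high) == 1:
        return "organ-specific", high
    return "other", high


# Invented profiles, for illustration only.
broad = {"embryo": 8.1, "flower": 12.0, "leaf": 6.3,
         "root": 7.7, "seed coat": 9.4, "stem": 5.2}
root_only = {"embryo": 0.2, "flower": 0.0, "leaf": 0.4,
             "root": 21.5, "seed coat": 0.1, "stem": 1.3}
print(expression_class(broad)[0])
print(expression_class(root_only))
```

A stricter definition of specificity, for example requiring a minimum fold difference over the second-highest organ, could be substituted without changing the overall structure.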

InR2R3-MYBs involved in anthocyanin biosynthesis
A well-known function of plant R2R3-MYBs is the regulation of anthocyanin biosynthesis, which is important in ornamental plants, including morning glory. In Japanese morning glory, InMYB1 has been reported to be involved in anthocyanin biosynthesis [42]. In addition, InMYB2 and InMYB3 have been reported as orthologs of petunia AN2, which regulates anthocyanin biosynthesis in petunia [42].
INIL05g09650 has a predicted transcript sequence that matches the cDNA sequence of InMYB2 and is thus considered identical to InMYB2. InMYB2 is expressed in all tissues colored with anthocyanins other than petal [42]. According to the RNA-seq database, INIL05g09650 is expressed mostly in stems, which are anthocyanin-accumulating tissues other than petals (Fig 5). This supports the conclusion that InMYB2 and INIL05g09650 are identical. INIL05g09649, which has high homology to INIL05g09650, was highly expressed in stems (Fig 5). Therefore, INIL05g09649 may be involved in the regulation of anthocyanin biosynthesis in the stem along with INIL05g09650.
INIL05g09651 has the highest homology to InMYB3. INIL05g09651 lacks the R3 repeat and contains only the R2 repeat in the TKS line used for the genome database of the Japanese morning glory, while Morita et al. (2006) [42] reported that InMYB3 in the KK/ZSK-2 line contains both R2 and R3 repeats. INIL05g09651 of the TKS line has a stop codon after the region encoding the R2 repeat. This is considered an intraspecific polymorphism (single nucleotide substitution), and InMYB3 (INIL05g09651) is considered to have lost its function in the TKS line.
InMYB1 is expressed specifically in petal and is involved in the regulation of petal coloration (anthocyanin accumulation) in morning glory [42]. The promoter of InMYB1 can be used as a petal-specific promoter [73][74][75]. INIL00g10723, which was mapped not to a pseudo-chromosome but to a scaffold (Fig 4), has the highest homology to InMYB1. The upstream sequence of INIL00g10723 was identical to the promoter region of InMYB1. Therefore, INIL00g10723 and InMYB1 were considered to be identical. However, the C-terminal amino acid sequences of INIL00g10723 and InMYB1 were not identical, and an additional sequence was present in INIL00g10723. Morita et al. (2006) [42] reported that InMYB1 has three exons, while four exons are predicted in INIL00g10723, and the additional sequence corresponds to exon 4. Thus, we checked the genomic sequence and RNA-seq data of INIL00g10723 in the Japanese morning glory genome database, and found a sequence identical to the three exons of InMYB1, with a stop codon after exon 3 of INIL00g10723 (S2A and S2B Fig). Therefore, we concluded that the predicted coding sequence of INIL00g10723 was incorrect, and that InMYB1 and INIL00g10723 are identical.
Both INIL11g40874 and INIL11g40875, which have high homology to INIL00g10723, showed petal-specific expression. The numbers of exons of these two genes differed from those of the other C16 genes. Therefore, as with INIL00g10723, we checked the genomic sequences of [...] The genomic sequence, including upstream and downstream regions, of INIL11g40874 is identical to that of InMYB1 (INIL00g10723), except for exon 3 and its downstream region. The nonidentical region corresponds to the linkage point of contigs, and the sequence is considered to be erroneous. Therefore, INIL00g10723, INIL11g40874 and InMYB1 may be the identical gene on Chr. 11. Our final discussion of C16 is summarized in S3 Fig.

Conclusion
In this study, we performed a genome-wide analysis of R2R3-MYB transcription factors in Japanese morning glory. A total of 127 InR2R3-MYBs were identified in the Japanese morning glory genome, and their information, including gene structures, protein motifs and gene expression profiles, was collected. Our phylogenetic tree analysis revealed the presence of 29 subfamilies of InR2R3-MYBs, and the predicted functions of the subfamilies were discussed using gene expression profiles and the functions of Arabidopsis AtR2R3-MYBs. This study provides essential and useful information for further functional and physiological studies on InR2R3-MYBs in morning glory.